Reducing research bureaucracy in UK higher education: Can generative AI assist with the internal evaluation of quality?

Fletcher, Gordon, Khan, Saomai Vu, Fletcher, Aldus Greenhill

arXiv.org Artificial Intelligence

This paper examines the potential for generative artificial intelligence (GenAI) to assist with internal review processes for research quality evaluations in UK higher education, particularly in preparation for the Research Excellence Framework (REF). Using the lens of function substitution in the Viable Systems Model, we present an experimental methodology using ChatGPT to score and rank business and management papers from REF 2021 submissions, "reverse engineering" the assessment by comparing AI-generated scores with known institutional results. Through rigorous testing of 822 papers across 11 institutions, we established scoring boundaries that aligned with reported REF outcomes: 49% between 1* and 2*, 59% between 2* and 3*, and 69% between 3* and 4*. The results demonstrate that AI can provide consistent evaluations that help identify borderline cases requiring additional human scrutiny while reducing the substantial resource burden of traditional internal review processes. We argue for applying GenAI through a nuanced hybrid approach that maintains academic integrity while addressing the multi-million-pound costs of research evaluation bureaucracy. While acknowledging limitations, including potential AI biases, the research presents a promising framework for more efficient, consistent evaluations that could transform current approaches to research assessment.
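
The boundary-calibration step described in the abstract can be pictured as a simple threshold search. Below is a minimal, hypothetical sketch (not the authors' code): given AI-generated scores on a 0-100 scale and an institution's reported star-rating profile, grid-search the 1*/2*, 2*/3*, and 3*/4* cut-offs that best reproduce that profile. All function names, data, and search ranges are illustrative assumptions.

```python
# Hypothetical sketch of threshold calibration against known REF outcomes.
import numpy as np

def star_profile(scores, cuts):
    """Map raw 0-100 scores to 1*-4* bands using boundary cut-offs.
    cuts = (c12, c23, c34), e.g. (49, 59, 69) as reported in the paper."""
    return np.digitize(scores, bins=cuts) + 1  # bands 1..4

def calibrate(scores, reported_profile):
    """Grid-search cut-offs that minimise the gap between the AI-implied
    star distribution and the institution's reported distribution."""
    best, best_err = None, float("inf")
    for c12 in range(40, 60):
        for c23 in range(c12 + 1, 70):
            for c34 in range(c23 + 1, 80):
                stars = star_profile(scores, (c12, c23, c34))
                profile = np.bincount(stars, minlength=5)[1:] / len(scores)
                err = np.abs(profile - reported_profile).sum()
                if err < best_err:
                    best, best_err = (c12, c23, c34), err
    return best
```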


How to Find Fantastic AI Papers: Self-Rankings as a Powerful Predictor of Scientific Impact Beyond Peer Review

Su, Buxin, Collina, Natalie, Wen, Garrett, Li, Didong, Cho, Kyunghyun, Fan, Jianqing, Zhao, Bingxin, Su, Weijie

arXiv.org Artificial Intelligence

Peer review in academic research aims not only to ensure factual correctness but also to identify work of high scientific potential that can shape future research directions. This task is especially critical in fast-moving fields such as artificial intelligence (AI), yet it has become increasingly difficult given the rapid growth of submissions. In this paper, we investigate an underexplored measure for identifying high-impact research: authors' own rankings of their multiple submissions to the same AI conference. Grounded in game-theoretic reasoning, we hypothesize that self-rankings are informative because authors possess unique understanding of their work's conceptual depth and long-term promise. To test this hypothesis, we conducted a large-scale experiment at a leading AI conference, where 1,342 researchers self-ranked their 2,592 submissions by perceived quality. Tracking outcomes over more than a year, we found that papers ranked highest by their authors received twice as many citations as their lowest-ranked counterparts; self-rankings were especially effective at identifying highly cited papers (those with over 150 citations). Moreover, we showed that self-rankings outperformed peer review scores in predicting future citation counts. Our results remained robust after accounting for confounders such as preprint posting time and self-citations. Together, these findings demonstrate that authors' self-rankings provide a reliable and valuable complement to peer review for identifying and elevating high-impact research in AI.
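
As a toy illustration of the paper's central comparison, the predictive value of self-rankings versus review scores can be checked with a rank correlation against later citations. The data and column names below are hypothetical placeholders, not the study's data.

```python
# Illustrative comparison: do self-rankings or review scores better
# track eventual citations? Synthetic five-paper example.
import pandas as pd
from scipy.stats import spearmanr

df = pd.DataFrame({
    "self_rank":    [1, 2, 3, 1, 2],          # 1 = author's top-ranked paper
    "review_score": [6.5, 5.0, 4.5, 7.0, 5.5],
    "citations":    [180, 40, 12, 95, 30],
})

# A strong negative correlation is expected for self_rank
# (rank 1 = best, so lower rank should mean more citations).
rho_self, _ = spearmanr(df["self_rank"], df["citations"])
rho_review, _ = spearmanr(df["review_score"], df["citations"])
print(f"self-rank vs citations:    rho = {rho_self:.2f}")
print(f"review score vs citations: rho = {rho_review:.2f}")
```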



3D Printing Supplementary Material

Neural Information Processing Systems

Figure 1: The Slice-100K dataset consists of STL files and their G-code counterparts. We provide additional visualizations to understand the distribution of STL models in Slice-100K. Slicing: we use PrusaSlicer to generate G-code from STL files. Finetuning implementation: for finetuning our translation model, we use a batch size of 32 with 8 gradient accumulation steps. However, we do foresee some potential negative societal impacts.
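
The quoted finetuning detail (batch size 32 with 8 gradient-accumulation steps, i.e. an effective batch of 256) corresponds to the standard accumulation pattern sketched below. Model, data loader, optimizer, and loss are placeholders; this is not the authors' training code.

```python
# Minimal gradient-accumulation sketch: 8 micro-batches of 32
# accumulate into one optimizer step (effective batch size 256).
import torch

ACCUM_STEPS = 8  # from the supplementary material

def train_epoch(model, loader, optimizer, loss_fn):
    optimizer.zero_grad()
    for i, (x, y) in enumerate(loader):        # loader yields batches of 32
        loss = loss_fn(model(x), y) / ACCUM_STEPS  # rescale so gradients average
        loss.backward()                            # gradients accumulate in-place
        if (i + 1) % ACCUM_STEPS == 0:
            optimizer.step()                   # one update per 8 micro-batches
            optimizer.zero_grad()
```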


Insights from the ICLR Peer Review and Rebuttal Process

Kargaran, Amir Hossein, Nikeghbal, Nafiseh, Yang, Jing, Ousidhoum, Nedjma

arXiv.org Artificial Intelligence

Peer review is a cornerstone of scientific publishing, including at premier machine learning conferences such as ICLR. As submission volumes increase, understanding the nature and dynamics of the review process is crucial for improving its efficiency, effectiveness, and the quality of published papers. We present a large-scale analysis of the ICLR 2024 and 2025 peer review processes, focusing on before- and after-rebuttal scores and reviewer-author interactions. We examine review scores, author-reviewer engagement, temporal patterns in review submissions, and co-reviewer influence effects. Combining quantitative analyses with LLM-based categorization of review texts and rebuttal discussions, we identify common strengths and weaknesses for each rating group, as well as trends in rebuttal strategies that are most strongly associated with score changes. Our findings show that initial scores and the ratings of co-reviewers are the strongest predictors of score changes during the rebuttal, pointing to a degree of reviewer influence. Rebuttals play a valuable role in improving outcomes for borderline papers, where thoughtful author responses can meaningfully shift reviewer perspectives. More broadly, our study offers evidence-based insights to improve the peer review process, guiding authors on effective rebuttal strategies and helping the community design fairer and more efficient review processes. Our code and score changes data are available at https://github.com/papercopilot/iclr-insights.
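
One simple way to picture the co-reviewer-influence finding is a regression of each reviewer's post-rebuttal score change on their own initial score and the mean initial score of their co-reviewers. The sketch below is a toy, single-paper illustration under assumptions not taken from the paper; a real analysis would pool score changes across thousands of reviews.

```python
# Toy sketch: does a reviewer's score change track co-reviewers' scores?
import numpy as np

def co_reviewer_means(initial):
    """For each reviewer, the mean initial score of the OTHER reviewers."""
    return (initial.sum() - initial) / (len(initial) - 1)

# One paper with four reviewers: initial and post-rebuttal scores (synthetic).
initial = np.array([3.0, 5.0, 6.0, 8.0])
final   = np.array([5.0, 6.0, 6.0, 8.0])

# Design matrix: intercept, own initial score, co-reviewer mean.
X = np.column_stack([np.ones(len(initial)), initial, co_reviewer_means(initial)])
beta, *_ = np.linalg.lstsq(X, final - initial, rcond=None)
print("intercept, own-score, co-reviewer coefficients:", beta)
```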





We thank the reviewers for their encouraging and instructive comments, and the AC for guiding the review process

Neural Information Processing Systems

We thank the reviewers for their encouraging and instructive comments, and the AC for guiding the review process. Gray (2013), and may look a bit too complicated. We will add a remark in line with our comment above. Note that the assumption on encoder gap is very mild. R2: It is not clear that sparsity-promoting encoders are the right models to be studying. Ours is the first work to address this.